Cell-DROS: A Fast Outlier Detection Method for Big Datasets
Authors
Abstract
Outlier detection is a bottleneck in the analysis of big datasets because of its time cost. This paper proposes a fast outlier detection method for big datasets that combines cell-based algorithms with a ranking-based algorithm with various depths. A cell-based algorithm transforms a very large dataset into a fairly small set of weighted cells based on predefined lower and upper bounds. A ranking-based algorithm with various depths is then modified and applied to the weighted cells to calculate an outlier score for each cell and sort the cells by that score. Finally, an outlier obtaining algorithm identifies the ids of outliers from the ranked cells and eliminates those outliers from the given datasets. Experimental results show that the proposed method produces the same results as the previous rank-difference outlier detection algorithm while reducing execution time by up to 99%.
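The three stages described in the abstract (cells, scoring, id recovery) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not specify how cells are constructed or how the rank-difference score is computed, so the uniform grid, the `cells_per_dim` parameter, and the isolation-over-weight score below are stand-ins, and all function names are hypothetical.

```python
from collections import defaultdict
import math


def build_cells(points, lower, upper, cells_per_dim):
    """Map each point to a uniform grid cell (assumed layout).

    Returns {cell_index_tuple: [point ids]}; a cell's weight is the
    number of ids it holds.
    """
    cells = defaultdict(list)
    for pid, point in enumerate(points):
        index = []
        for x, lo, hi in zip(point, lower, upper):
            # Clamp to the predefined bounds, then scale to a cell index.
            frac = (min(max(x, lo), hi) - lo) / (hi - lo)
            index.append(min(int(frac * cells_per_dim), cells_per_dim - 1))
        cells[tuple(index)].append(pid)
    return cells


def score_cells(cells, k=2):
    """Stand-in outlier score, NOT the paper's rank-based score.

    Score = mean distance to the k nearest other occupied cells,
    divided by the cell's weight, so isolated, sparse cells rank high.
    """
    keys = list(cells)
    scores = {}
    for cell in keys:
        dists = sorted(math.dist(cell, other) for other in keys if other != cell)
        nearest = dists[:k]
        isolation = sum(nearest) / len(nearest) if nearest else 0.0
        scores[cell] = isolation / len(cells[cell])
    return scores


def outlier_ids(cells, scores, top_n):
    """Collect the point ids held by the top_n highest-scoring cells."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    ids = []
    for cell in ranked[:top_n]:
        ids.extend(cells[cell])
    return ids


# Toy data: five clustered points and one far-away point (id 5).
points = [(0.10, 0.10), (0.15, 0.12), (0.20, 0.18),
          (0.30, 0.10), (0.30, 0.20), (0.90, 0.95)]
cells = build_cells(points, lower=(0.0, 0.0), upper=(1.0, 1.0), cells_per_dim=4)
scores = score_cells(cells, k=2)
print(outlier_ids(cells, scores, top_n=1))  # [5]
```

The point of the sketch is the cost structure: scoring runs over the occupied cells (three here) rather than over all points, which is where a cell-based reduction would save time on a large dataset.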
Similar resources
Fast Data Clustering and Outlier Detection Using K-means Clustering on Apache Spark
The components forming the information society nowadays are seen in all areas of our lives. As computers have a great deal of importance in our lives, the amount of information has begun to gather meaningful and specific qualities. Not only has the amount of information increased, but the speed of access to information has increased as well. Large data is the transformed form of all data recovered ...
Outlier Detection on Mixed-Type Data: An Energy-Based Approach
Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for a single data type, such as continuous or discrete. However, real-world data is increasingly heterogeneous, where a data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. I...
Fast and Scalable Outlier Detection with Approximate Nearest Neighbor Ensembles
Popular outlier detection methods require the pairwise comparison of objects to compute the nearest neighbors. This inherently quadratic problem is not scalable to large data sets, making multidimensional outlier detection for big data still an open challenge. Existing approximate neighbor search methods are designed to preserve distances as well as possible. In this article, we present a highl...
CURIO: A Fast Outlier and Outlier Cluster Detection Algorithm for Large Datasets
Outlier (or anomaly) detection is an important problem for many domains, including fraud detection, risk analysis, network intrusion and medical diagnosis, and the discovery of significant outliers is becoming an integral aspect of data mining. This paper presents CURIO, a novel algorithm that uses quantisation and implied distance metrics to provide a fast algorithm that is linear for the numb...
Fast outlier detection using rough sets theory
In many knowledge discovery applications, finding outliers is more interesting than finding inliers in a dataset. Outliers are the rare cases in a dataset that are described as abnormal data in the information table. Outlier detection is applied in many important applications, such as fraud detection systems, to uncover suspicious objects which may have important knowledge...
Publication date: 2016